We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit as the usual distribution-free and distribution-dependent bounds for the non-Bayesian stochastic bandit. Building on the techniques of Audibert and Bubeck [2009] and Russo and Van Roy [2013], we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14 \sqrt{n K}$. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20} \sqrt{n K}$. We also study the case of priors for the setting of Bubeck et al. [2013] (where the optimal mean is known, as well as a lower bound on the smallest gap), and we show that in this case the regret of Thompson Sampling is in fact uniformly bounded over time, thus showing that Thompson Sampling can greatly take advantage of the nice properties of these priors.
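For readers unfamiliar with the algorithm under study, the following is a minimal sketch (not the paper's analysis or code) of Thompson Sampling for Bernoulli bandits with independent Beta(1, 1) priors on each arm's mean; the function and argument names are illustrative assumptions.

```python
import random

def thompson_sampling(arms, n):
    """Run Thompson Sampling for n rounds on Bernoulli arms.

    arms: list of callables, each returning a {0, 1} reward when pulled
         (illustrative interface, not from the paper).
    Each arm's mean gets a Beta(1, 1) prior; after observing rewards the
    posterior is Beta(1 + successes, 1 + failures).
    """
    K = len(arms)
    alpha = [1] * K  # posterior successes + 1, per arm
    beta = [1] * K   # posterior failures + 1, per arm
    total_reward = 0
    for _ in range(n):
        # Sample a plausible mean from each arm's posterior,
        # then play the arm whose sample is largest.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(K)]
        i = max(range(K), key=lambda j: samples[j])
        reward = arms[i]()
        alpha[i] += reward
        beta[i] += 1 - reward
        total_reward += reward
    return total_reward
```

The posterior sampling step is what distinguishes Thompson Sampling from optimistic index policies: exploration arises because suboptimal arms with uncertain posteriors occasionally produce the largest sample.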